<html>
<head><meta charset="utf-8"><title>UTF8 footgun · general · Zulip Chat Archive</title></head>
<h2>Stream: <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/index.html">general</a></h2>
<h3>Topic: <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html">UTF8 footgun</a></h3>

<hr>

<base href="https://rust-lang.zulipchat.com">

<head><link href="https://rust-lang.github.io/zulip_archive/style.css" rel="stylesheet"></head>

<a name="211950320"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211950320" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Dániel Buga <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211950320">(Oct 01 2020 at 18:06)</a>:</h4>
<p>Footgun of the day: non-printed UTF8 characters. I just managed to shoot myself with this one: <a href="https://github.com/rust-lang/rust/issues/77417">https://github.com/rust-lang/rust/issues/77417</a> - I copied one of those "&shy;" characters into my code, which wasn't printed, so I thought I found a compiler bug. Lovely. Does anybody see a way to avoid these kinds of mistakes?</p>



<a name="211950407"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211950407" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> oliver <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211950407">(Oct 01 2020 at 18:07)</a>:</h4>
<p>Some editors will highlight those characters for you</p>



<a name="211954601"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211954601" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Dániel Buga <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211954601">(Oct 01 2020 at 18:36)</a>:</h4>
<p>Thanks, I didn't know that. Unfortunately, VS Code doesn't seem to. Anyway, what would people think about a "suspicious string" lint for this? For clippy, for example? (Is it even possible to get the original byte sequence for a string literal?)</p>



<a name="211955543"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211955543" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> oliver <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211955543">(Oct 01 2020 at 18:43)</a>:</h4>
<p><a href="https://stackoverflow.com/questions/7560177/highlighting-and-replacing-non-printable-unicode-characters-in-emacs">https://stackoverflow.com/questions/7560177/highlighting-and-replacing-non-printable-unicode-characters-in-emacs</a></p>



<a name="211964452"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211964452" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> oliver <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211964452">(Oct 01 2020 at 19:48)</a>:</h4>
<p>Can't really speak to the linting idea. Seems like you are getting the feedback<br>
from the compiler as it is. What was the error message you got?</p>



<a name="211964699"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211964699" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Dániel Buga <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211964699">(Oct 01 2020 at 19:50)</a>:</h4>
<p>No error message, I just got an unexpected binary because I didn't notice the non-printed chars in the source.</p>



<a name="211964880"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211964880" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> oliver <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211964880">(Oct 01 2020 at 19:51)</a>:</h4>
<p>It's not unreasonable that it would error if it's producing unexpected results</p>



<a name="211965030"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211965030" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Dániel Buga <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211965030">(Oct 01 2020 at 19:52)</a>:</h4>
<p>But Rust fully supports UTF8 in strings, so it's not invalid - just extremely hard to notice</p>



<a name="211965920"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211965920" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> oliver <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211965920">(Oct 01 2020 at 19:59)</a>:</h4>
<p>\302\255\302\255</p>
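<p>The octal escapes above match the UTF-8 encoding of two soft hyphens (U+00AD, bytes 0xC2 0xAD). A minimal sketch to reproduce them; the helper name <code>octal_escapes</code> is made up for illustration and is not part of any tool mentioned in this thread:</p>

```rust
/// Render every byte of `s` as a backslash followed by its octal value.
/// (Hypothetical helper, written only for this illustration.)
fn octal_escapes(s: &str) -> String {
    s.bytes().map(|b| format!("\\{:o}", b)).collect()
}

fn main() {
    // Two soft hyphens, written with escapes so they stay visible in the source.
    let two_soft_hyphens = "\u{ad}\u{ad}";
    // Each U+00AD is encoded as the two bytes 0xC2 0xAD, i.e. octal 302 255.
    println!("{}", octal_escapes(two_soft_hyphens));
}
```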



<a name="211966918"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211966918" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> oliver <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211966918">(Oct 01 2020 at 20:07)</a>:</h4>
<p>I actually thought it was a different issue without looking at your link</p>



<a name="211966931"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/211966931" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Dániel Buga <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#211966931">(Oct 01 2020 at 20:07)</a>:</h4>
<p>I suspected a parse error up until the point where I tried another character code and noticed that only one of the \u{ad}'s changed... Hence the footgun, but I'm not sure if you or others would agree. I managed to play myself, after all.</p>



<a name="212010995"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/212010995" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#212010995">(Oct 02 2020 at 06:30)</a>:</h4>
<p>Linting on invisible characters sounds logical to me; one can always use escapes to put those in.  Or use <code>include_str!</code> (which the lint should probably not look at).</p>
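<p>A rough sketch of what such a check could look like on a piece of source text. The character list and the name <code>find_invisible</code> are assumptions chosen for illustration; this is not how Clippy or rustc implement any lint:</p>

```rust
/// An illustrative (not exhaustive) list of characters that usually render as nothing:
/// soft hyphen, zero width space, zero width non-joiner, zero width joiner,
/// word joiner, and zero width no-break space.
const INVISIBLE: &[char] = &[
    '\u{ad}', '\u{200b}', '\u{200c}', '\u{200d}', '\u{2060}', '\u{feff}',
];

/// Return the byte offset and character for every listed invisible character in `src`.
/// (Hypothetical helper, written only for this illustration.)
fn find_invisible(src: &str) -> Vec<(usize, char)> {
    src.char_indices()
        .filter(|(_, c)| INVISIBLE.contains(c))
        .collect()
}

fn main() {
    // The soft hyphen is written as an escape here so it stays visible in this example.
    let pasted = "soft\u{ad}hyphen";
    for (offset, c) in find_invisible(pasted) {
        println!("byte offset {}: {}", offset, c.escape_unicode());
    }
}
```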



<a name="212041924"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/212041924" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Dániel Buga <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#212041924">(Oct 02 2020 at 07:42)</a>:</h4>
<p>I'm not (yet) sure how <code>include_str!</code> works and would interact with the lint, but we need to look at the string at the source level (i.e. what the user actually typed), since escape sequences seem to be parsed immediately.</p>
<p>I would argue that a check like this would be most useful as a compiler warning, but it is probably too expensive to be one.</p>
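<p>To illustrate why the check has to run on the source text: once an escape is processed, the resulting value looks the same as a pasted invisible character. A small sketch, assuming a made-up helper <code>has_soft_hyphen</code> and using a raw string to stand in for what the user typed:</p>

```rust
/// True if `text` contains a literal soft hyphen (U+00AD).
/// (Hypothetical helper, written only for this illustration.)
fn has_soft_hyphen(text: &str) -> bool {
    text.contains('\u{ad}')
}

fn main() {
    // Source text as typed, with the escape spelled out.
    // A raw string keeps the backslash sequence as plain ASCII.
    let source_with_escape = r#"let s = "a\u{ad}b";"#;
    // The value of the literal after the escape has been processed.
    let cooked_value = "a\u{ad}b";

    // The typed source contains no invisible character, so a source-level check stays quiet.
    assert!(!has_soft_hyphen(source_with_escape));
    // The processed value does contain one, so a check on values cannot tell
    // an explicit escape apart from a pasted invisible character.
    assert!(has_soft_hyphen(cooked_value));
}
```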



<a name="212057715"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/122651-general/topic/UTF8%20footgun/near/212057715" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Dániel Buga <a href="https://rust-lang.github.io/zulip_archive/stream/122651-general/topic/UTF8.20footgun.html#212057715">(Oct 02 2020 at 10:33)</a>:</h4>
<p>I've opened a PR against Clippy. Thanks for your input.</p>



<hr><p>Last updated: Aug 07 2021 at 22:04 UTC</p>
</html>